AI029
Reinforcement Learning: An Introduction
Multi-armed Bandits
Learning Objectives
- Define the k-armed bandit problem framework
- Evaluate the exploration-exploitation trade-off
- Implement epsilon-greedy and Upper Confidence Bound (UCB) action selection
- Analyze incremental update rules for action-value estimation
- Compare the performance of bandit algorithms in stationary and non-stationary environments
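
As a preview of the objectives above, the following is a minimal sketch of epsilon-greedy and UCB action selection sharing one incremental sample-average update, Q(a) <- Q(a) + (R - Q(a)) / N(a). It is an illustration under stated assumptions, not a reference implementation: the function name `run_bandit`, its parameters (`true_means`, `ucb_c`, `seed`), and the choice of unit-variance Gaussian rewards on a stationary problem are all hypothetical and chosen only for this example.

```python
import math
import random


def run_bandit(true_means, steps=2000, epsilon=0.1, ucb_c=None, seed=0):
    """Run one agent on a stationary k-armed bandit (assumed Gaussian rewards).

    If ucb_c is None the agent is epsilon-greedy; otherwise it uses UCB
    with exploration constant ucb_c.
    Returns (estimates, counts, total_reward).
    """
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k   # action-value estimates Q(a)
    n = [0] * k     # number of times each action was selected, N(a)
    total = 0.0

    for t in range(1, steps + 1):
        if ucb_c is not None:
            # UCB: select every untried arm once, then maximize
            # Q(a) + c * sqrt(ln t / N(a)).
            untried = [a for a in range(k) if n[a] == 0]
            if untried:
                action = untried[0]
            else:
                action = max(
                    range(k),
                    key=lambda a: q[a] + ucb_c * math.sqrt(math.log(t) / n[a]),
                )
        elif rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            action = rng.randrange(k)
        else:
            # Exploit: pick the arm with the highest current estimate
            # (ties go to the lowest index).
            action = max(range(k), key=lambda a: q[a])

        # Assumed reward model: Gaussian around the arm's true mean, variance 1.
        reward = rng.gauss(true_means[action], 1.0)

        # Incremental sample-average update; no reward history is stored.
        n[action] += 1
        q[action] += (reward - q[action]) / n[action]
        total += reward

    return q, n, total


if __name__ == "__main__":
    means = [0.0, 0.5, 2.0]
    for label, kwargs in [("epsilon-greedy", {"epsilon": 0.1}), ("UCB", {"ucb_c": 2.0})]:
        q, n, total = run_bandit(means, **kwargs)
        print(label, "counts:", n, "average reward:", round(total / sum(n), 3))
```

For a non-stationary environment, a common variation replaces the `1 / n[action]` step size with a constant step size so that recent rewards weigh more heavily; the sketch above covers only the stationary case.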